
CI: mark window online test slow #41971


Merged (1 commit) Jun 15, 2021

Conversation

mzeitlin11 (Member)

Seems like the newly added online tests increase test runtime a lot, making failures like https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=61437&view=logs&j=404760ec-14d3-5d48-e580-13034792878f&t=f81e4cc8-d61a-5fb8-36be-36768e5c561a more common. The slowest durations showing this are below; I've marked these tests as slow as a potential solution.

============================ slowest 30 durations =============================
99.91s call     pandas/tests/resample/test_base.py::test_resample_interpolate[period_range-pi-_index_start1-_index_end1]
47.56s call     pandas/tests/resample/test_period_index.py::TestPeriodIndex::test_weekly_upsample[end-D-FRI]
47.36s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-True-obj1]
45.55s teardown pandas/tests/window/test_rolling.py::test_rolling_zero_window
30.25s call     pandas/tests/tslibs/test_timezones.py::test_cache_keys_are_distinct_for_pytz_vs_dateutil[America/Argentina/San_Luis]
28.37s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-True-obj0]
23.03s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-True-obj1]
23.00s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-True-obj1]
20.95s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-True-obj0]
20.30s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-True-obj0]
19.94s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-False-obj1]
19.81s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-False-obj0]
19.37s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-False-obj1]
19.25s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-False-obj0]
18.13s call     pandas/tests/test_sorting.py::TestSorting::test_int64_overflow_moar
18.10s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-False-obj0]
18.07s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-False-obj1]
17.17s call     pandas/tests/window/test_numba.py::TestEngine::test_numba_vs_cython_apply[True-False-False-False-True]
17.16s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-True-obj1]
16.84s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-False-True-obj1]
16.75s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-True-obj0]
16.48s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-True-True-obj1]
16.47s setup    pandas/tests/io/test_fsspec.py::test_from_s3_csv
16.36s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-True-True-obj1]
15.93s call     pandas/tests/io/test_compression.py::test_with_missing_lzma
15.83s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-False-True-obj1]
15.13s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-False-obj1]
14.46s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-True-True-obj0]
14.14s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-False-obj0]
14.00s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-True-False-obj0]

@mzeitlin11 mzeitlin11 added CI Continuous Integration Testing pandas testing functions or related to the test suite labels Jun 12, 2021
jreback (Contributor) commented Jun 12, 2021

cc @mroeschke

mroeschke (Member)

@stuartarchibald any pointers on how I could speed up our numba tests?

Generally I have a parameterized test like this:

def test_online_vs_non_online_mean(

Here I am parameterizing over nogil=True|False, parallel=True|False, and nopython=True|False (among other parameters).

A typical numba function being jitted under the hood looks like this:

def online_ewma(

The size of the data in the tests is fairly small, but jitting a new version of these functions for each parameter combination seems to take quite a bit of time.
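A rough stdlib-only sketch of the cost pattern described above (not numba itself; `jit_compile` here is a hypothetical stand-in for the jitting machinery): each distinct option tuple triggers a separate compilation, so total runtime scales with the number of parameter combinations rather than with the size of the test data.

```python
from functools import lru_cache
from itertools import product

compile_count = 0

# Stand-in for numba's jit dispatch: the compiled artifact is cached
# per option tuple, so every new (nogil, parallel, nopython)
# combination pays the full "compile" cost exactly once.
@lru_cache(maxsize=None)
def jit_compile(nogil, parallel, nopython):
    global compile_count
    compile_count += 1  # in real numba this step is the expensive one
    return lambda xs: sum(xs) / len(xs)  # trivial stand-in kernel

data = [1.0, 2.0, 3.0]  # tiny data: execution is not the bottleneck
for opts in product([True, False], repeat=3):
    jit_compile(*opts)(data)

print(compile_count)  # 8: one compile per option combination
```

With three independent boolean kwargs the grid is 2³ = 8 compilations per jitted kernel, before the other test parameters multiply it further.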

jreback (Contributor) commented Jun 12, 2021

@mroeschke probably OK to just have a couple of predefined numba option combinations rather than testing them all (we're not testing numba per se here)

stuartarchibald

@mroeschke @jreback I agree, cutting down the options is probably a good idea. Numba's test suite is massive and independently tests these features (and permutations thereof), so given the runtime cost it's probably best to focus on testing the things that matter to these pandas features.

  • The nogil option just toggles holding the GIL; it only affects code generation and can probably be ignored unless you are specifically testing to assert that the GIL is released.
  • The parallel option is probably worth testing, as it's one of the most complex transforms in Numba.
  • The nopython=True option enforces the "compile to run without the CPython API or fail" behaviour. You probably always want this, as these functions really need the performance and should always be compiled to run without the CPython API.

If you agree with the above, that cuts the test space down to nogil=False (default), nopython=True (default), parallel=True/False, i.e. just testing the variation in the parallel kwarg.
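A minimal sketch of the before/after parameter spaces this suggestion implies (plain Python with hypothetical names; pandas' actual fixtures differ):

```python
from itertools import product

# Full grid: every combination of the three numba kwargs, each of
# which forces a fresh JIT compile of the tested kernel.
full_grid = list(product([True, False],    # nogil
                         [True, False],    # parallel
                         [True, False]))   # nopython

# Reduced grid per the suggestion: pin nogil=False and nopython=True
# (the values these kernels should always use) and vary only
# parallel, the transform most worth exercising.
reduced_grid = [
    {"nogil": False, "nopython": True, "parallel": parallel}
    for parallel in (True, False)
]

print(len(full_grid), len(reduced_grid))  # 8 2
```

That is a 4x reduction in compilations per kernel before the remaining test parameters are even counted.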

jbrockmendel (Member)

If there isn't a better fix on the horizon, I think marking as slow is reasonable. This test accounts for about a quarter of the runtime with --skip-slow locally.

mroeschke (Member)

Thanks for the insight @stuartarchibald. I'll make those changes in a follow up PR.

As for this PR, I think marking as slow is still a good idea, as @jbrockmendel mentioned.
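For reference, marking a test slow is just attaching pytest's `slow` marker, which slow-skipping CI jobs (the `--skip-slow` runs mentioned above) then filter on. A minimal sketch; the real test is parameterized and lives in pandas/tests/window/test_online.py:

```python
import pytest

# Hypothetical reduced version of the pandas test: the change in this
# PR just adds the slow marker so slow-skipping CI jobs skip it.
@pytest.mark.slow
def test_online_vs_non_online_mean():
    ...

# The marker is plain metadata attached to the function object.
marks = [m.name for m in test_online_vs_non_online_mean.pytestmark]
print(marks)  # ['slow']
```

Nothing about the test body changes; only which CI jobs run it.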

@jreback jreback added this to the 1.3 milestone Jun 15, 2021
@jreback jreback merged commit 6f18ef6 into pandas-dev:master Jun 15, 2021
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 15, 2021
jreback (Contributor) commented Jun 15, 2021

thanks @mzeitlin11

@mroeschke ok to PR the limiting of the numba combinations as @stuartarchibald suggested

@mzeitlin11 mzeitlin11 deleted the online_slow branch June 15, 2021 04:52
simonjayhawkins pushed a commit that referenced this pull request Jun 15, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021